MMDiff: Statistical Testing for ChIP-Seq data sets
Abstract
ChIP-Seq has rapidly become the dominant experimental technique for determining the location of transcription factor binding sites and histone modifications. Typically, computational peak finders, such as MACS [Zhang et al., 2008], are used to identify candidate regions, i.e. regions with significantly enriched read coverage compared to some background. In the following, we simply call these regions peaks and assume that their genomic coordinates are provided. Going beyond this basic analysis, it is often of interest to detect the subset of peaks where read coverage changes significantly in a treatment experiment relative to a control (see Figure 1). Statistical analysis of ChIP-Seq data, however, remains challenging because of the highly structured nature of the data and the paucity of replicates. Current approaches to detecting differentially bound regions are mainly borrowed from RNA-Seq data analysis; they focus on the total count of fragments mapped to a region and ignore any information encoded in the shape of the peak profile. Higher-order features of ChIP-Seq peak enrichment profiles carry important information that is often complementary to total counts, and hence are potentially important in assessing differential binding. We therefore incorporate higher-order information into testing for differential binding by adapting recently proposed kernel-based statistical tests to ChIP-Seq data.
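The contrast between total counts and peak shape can be illustrated with a small sketch. It assumes the kernel-based test is in the spirit of the maximum mean discrepancy (MMD), which the name MMDiff suggests but the abstract does not state; the function names, the Gaussian kernel, the bandwidth and the read positions are all hypothetical and are not taken from the MMDiff package.

```python
import math

def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel between two genomic positions (assumed kernel choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))

def mmd2_biased(xs, ys, sigma):
    """Biased estimate of the squared MMD between two samples of read positions."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, sigma) for u in a for v in b)
        return total / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

# Hypothetical read start positions (bp offsets inside one peak region).
# Both samples contain 8 reads, so a purely count-based comparison sees no change.
control = [100, 105, 110, 115, 120, 125, 130, 135]    # one central summit
treatment = [40, 45, 50, 55, 180, 185, 190, 195]      # two flanking summits

sigma = 20.0  # kernel bandwidth in bp, chosen arbitrarily for illustration

count_difference = len(treatment) - len(control)
shape_distance = mmd2_biased(control, treatment, sigma)

print("difference in total counts:", count_difference)
print("squared MMD between profiles:", shape_distance)
```

In this toy case the count difference is zero while the squared MMD is clearly positive, which is the kind of shape change a count-based test would miss. Turning the distance into a p-value would still require a null distribution, for example from replicates or permutations, which this sketch does not attempt.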
Similar resources
Shape matters: Differential peak detection for Chip-seq data sets
Motivation and Objectives ChIP-Seq has rapidly become the dominant experimental technique in functional genomic and epigenomic research. Statistical analysis of ChIP-Seq data sets however remains challenging, due to the highly structured nature of the data and the paucity of replicates. Current approaches to detect differentially bound or modified genomic regions are mainly borrowed from RNA-Se...
Features that define the best ChIP-seq peak calling algorithms
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is an important tool for studying gene regulatory proteins, such as transcription factors and histones. Peak calling is one of the first steps in the analysis of these data. Peak calling consists of two sub-problems: identifying candidate peaks and testing candidate peaks for statistical significance. We surveyed 30 methods and ide...
Identifying Cell Type-Specific Transcription Factors by Integrating ChIP-seq and eQTL Data–Application to Monocyte Gene Regulation
We describe a novel computational approach to identify transcription factors (TFs) that are candidate regulators in a human cell type of interest. Our approach involves integrating cell type-specific expression quantitative trait locus (eQTL) data and TF data from chromatin immunoprecipitation-to-tag-sequencing (ChIP-seq) experiments in cell lines. To test the method, we used eQTL data from hum...
A fully Bayesian hidden Ising model for ChIP-seq data analysis.
Chromatin immunoprecipitation followed by next generation sequencing (ChIP-seq) is a powerful technique that is being used in a wide range of biological studies including genome-wide measurements of protein-DNA interactions, DNA methylation, and histone modifications. The vast amount of data and biases introduced by sequencing and/or genome mapping pose new challenges and call for effective met...
ChIP-Enrich: gene set enrichment testing for ChIP-seq data
Gene set enrichment testing can enhance the biological interpretation of ChIP-seq data. Here, we develop a method, ChIP-Enrich, for this analysis which empirically adjusts for gene locus length (the length of the gene body and its surrounding non-coding sequence). Adjustment for gene locus length is necessary because it is often positively associated with the presence of one or more peaks and b...
Publication date: 2012